BUGfix: Fix image_grid_thw `IndexError` in GRPOTrainer with Multimodal Models (Qwen3-VL) due to `None` Values in Chat Content by SolarWindRider · Pull Request #5364 · huggingface/trl

SolarWindRider · 2026-03-24T16:50:28Z

Fix `IndexError` in GRPOTrainer with Multimodal Models due to `None` Values in Chat Content

Summary

This PR fixes a critical bug in GRPOTrainer that causes training to fail completely when using multimodal models (Qwen3-VL) where chat messages contain content blocks with None values, a common pattern when datasets are processed by automated pipelines.

The Problem

Severity: 🔴 Critical (Training Blocker)

When training with GRPOTrainer on Qwen3-VL, I encounter this cryptic error:

  File "/home/ma-user/work/avr/train_grpo.py", line 98, in <module>
    trainer.train()
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.11/site-packages/transformers/trainer.py", line 1424, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.11/site-packages/transformers/trainer.py", line 1506, in _inner_training_loop
    self._run_epoch(
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.11/site-packages/transformers/trainer.py", line 1734, in _run_epoch
    tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ma-user/work/trl-my/trl/trainer/grpo_trainer.py", line 1083, in training_step
    output = super().training_step(model, inputs, num_items_in_batch)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.11/site-packages/transformers/trainer.py", line 1900, in training_step
    inputs = self._prepare_inputs(inputs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ma-user/work/trl-my/trl/extras/profiling.py", line 202, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ma-user/work/trl-my/trl/trainer/grpo_trainer.py", line 1112, in _prepare_inputs
    generation_batch = self._generate_and_score_completions(generation_batch)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ma-user/work/trl-my/trl/trainer/grpo_trainer.py", line 1766, in _generate_and_score_completions
    ) = self._generate(prompts)
        ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ma-user/work/trl-my/trl/trainer/grpo_trainer.py", line 1614, in _generate
    prompt_ids, images, multimodal_fields = self._tokenize_prompts(prompts)
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ma-user/work/trl-my/trl/trainer/grpo_trainer.py", line 1261, in _tokenize_prompts
    tokenized = self.processing_class.apply_chat_template(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.11/site-packages/transformers/processing_utils.py", line 1829, in apply_chat_template
    out = self(
          ^^^^^
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.11/site-packages/transformers/models/qwen3_vl/processing_qwen3_vl.py", line 132, in __call__
    num_image_tokens = image_grid_thw[index].prod() // merge_length
                       ~~~~~~~~~~~~~~^^^^^^^
IndexError: index 2 is out of bounds for axis 0 with size 2
[ERROR] 2026-03-25-00:17:32 (PID:2357248, Device:0, RankID:-1) ERR99999 UNKNOWN applicaiton exception

This error occurs deep inside transformers' processing_qwen3_vl.py:

num_image_tokens = image_grid_thw[index].prod() // merge_length
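A minimal sketch of why this line can raise, assuming the processor walks the image placeholders in the rendered text and looks up one `image_grid_thw` row per placeholder. The `expand_image_tokens` function, the plain-tuple grids, and the `merge_length=4` default are illustrative stand-ins, not the transformers implementation:

```python
from math import prod

IMAGE_TOKEN = "<|image_pad|>"

def expand_image_tokens(texts, image_grid_thw, merge_length=4):
    """Replace each placeholder with as many tokens as its image needs."""
    index = 0  # running index over all images in the batch
    expanded = []
    for text in texts:
        while IMAGE_TOKEN in text:
            # One grid row is consumed per placeholder found in the text.
            num_image_tokens = prod(image_grid_thw[index]) // merge_length
            text = text.replace(IMAGE_TOKEN, "<|placeholder|>" * num_image_tokens, 1)
            index += 1
        expanded.append(text.replace("<|placeholder|>", IMAGE_TOKEN))
    return expanded

grids = [(1, 4, 4), (1, 4, 4)]  # two images were supplied for the batch

# One placeholder per image: the lookup stays in bounds.
expand_image_tokens(["a" + IMAGE_TOKEN, "b" + IMAGE_TOKEN], grids)

# Three placeholders in the first prompt: the third lookup uses index 2.
try:
    expand_image_tokens([IMAGE_TOKEN * 3, IMAGE_TOKEN], grids)
except IndexError as err:
    print("IndexError:", err)
```

Under these assumptions the crash is a count mismatch: the text asks for more images than were actually passed in.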

Root Cause Analysis

The error message is misleading:

Surface Level: The crash happens in processing_qwen3_vl.py when accessing image_grid_thw[index]
Misleading: The stack trace suggests an image-processing bug
Truth: The actual issue is in chat template rendering within trl
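To make the last point concrete, here is a minimal sketch of how a template that branches on key presence would treat a text block carrying an `image: None` key. The `render_block` helper is hypothetical and only imitates that pattern in plain Python; it is not the actual Qwen3-VL chat template:

```python
IMAGE_SLOT = "<|vision_start|><|image_pad|><|vision_end|>"

def render_block(block):
    # Assumed template rule: the image branch fires when the key exists,
    # regardless of whether its value is None.
    if "image" in block or block.get("type") == "image":
        return IMAGE_SLOT
    return block.get("text") or ""

def render_content(content):
    return "".join(render_block(block) for block in content)

polluted = [
    {"type": "image", "image": "demo.png", "text": None},
    {"type": "text", "text": "What is shown?", "image": None},
]
clean = [
    {"type": "image", "image": "demo.png"},
    {"type": "text", "text": "What is shown?"},
]

print(render_content(polluted))  # two image slots, question text lost
print(render_content(clean))     # one image slot followed by the question
```

With the polluted input the text block is rendered as a second image slot, so the prompt ends up with more placeholders than images.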

The debugging journey:

Through breakpoint debugging, I traced the issue to the chat template rendering step.

transformers/utils/chat_template_utils.py
line 555
            rendered_chat = compiled_template.render(
                messages=chat,
                tools=tool_schemas,
                documents=documents,
                add_generation_prompt=add_generation_prompt,
                **kwargs,
            )

# Dataset automatically adds keys with None values
print(chat)
[{'content': [{'image': None, 'text': 'You are good at step by step reasoning.', 'type': 'text'}], 'role': 'system'}, {'content': [{'image': <PIL.PngImagePlugin.PngImageFile image mode=RGB size=560x168 at 0xFFFC18462D90>, 'text': None, 'type': 'image'}, {'image': None, 'text': '[Logical Reasoning]  \nThe left image shows the unfolded surface of a cube-shaped box. Which option can be folded into the cube depicted?option: A,B,C,D\nWrite the answer into a JSON form\n```json\n{"answer": "X"}```', 'type': 'text'}], 'role': 'user'}]


# rendered_chat is missing information, causing the error above.

print(rendered_chat)
<|im_start|>system
<|vision_start|><|image_pad|><|vision_end|><|im_end|>
<|im_start|>user
<|vision_start|><|image_pad|><|vision_end|><|vision_start|><|image_pad|><|vision_end|><|im_end|>
<|im_start|>assistant
<think>



# pop all the keys with None value
print(chat2)
[{'content': [{'text': 'You are good at step by step reasoning.', 'type': 'text'}], 'role': 'system'}, {'content': [{'image': <PIL.PngImagePlugin.PngImageFile image mode=RGB size=560x168 at 0xFFFC18462D90>, 'type': 'image'}, {'text': '[Logical Reasoning]  \nThe left image shows the unfolded surface of a cube-shaped box. Which option can be folded into the cube depicted?option: A,B,C,D\nWrite the answer into a JSON form\n```json\n{"answer": "X"}```', 'type': 'text'}], 'role': 'user'}]

# to get correct result
print(rendered_chat2)
<|im_start|>system
You are good at step by step reasoning.<|im_end|>
<|im_start|>user
<|vision_start|><|image_pad|><|vision_end|>[Logical Reasoning]  
The left image shows the unfolded surface of a cube-shaped box. Which option can be folded into the cube depicted?option: A,B,C,D
Write the answer into a JSON form
```json
{"answer": "X"}```<|im_end|>
<|im_start|>assistant
<think>

The Fix

Filter out None values from content blocks before passing to apply_chat_template():

# Before: {'image': None, 'text': 'reasoning'} → <|image_pad|> (text lost!)
# After:  {'text': 'reasoning'}                 → "reasoning" (correct!)

Location: trl/trainer/grpo_trainer.py, line ~1709, in the input processing loop

This fix is minimal and surgical: it is placed at the exact location where prompts are processed, which limits its impact on the rest of the trainer.
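A minimal sketch of the filtering idea, assuming a recursive helper that drops None-valued dict keys at every depth. The name `strip_none` is hypothetical and this is not the exact code in the PR diff:

```python
def strip_none(value):
    """Return a copy of `value` with None-valued dict keys removed at every depth.

    None items inside lists are left alone; only dict entries are dropped.
    Other objects (for example PIL images) are returned unchanged.
    """
    if isinstance(value, dict):
        return {k: strip_none(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [strip_none(v) for v in value]
    return value

message = {
    "role": "system",
    "content": [{"image": None, "text": "You are good at step by step reasoning.", "type": "text"}],
}
print(strip_none(message))
```

Because the helper builds new containers instead of mutating its input, the original sample stays available for debugging.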

Impact

Who it affects: Anyone using GRPOTrainer with VLM models (Qwen3-VL)
Why it's common: Automated data pipelines often produce None values in optional fields (e.g., {'image': None, 'text': '...', 'type': 'text'})
What it breaks: Complete training failure with no workaround without this fix

Testing

Verified fix resolves the Bug with Qwen3-VL-2B-Thinking

Note

Medium Risk
Touches the core GRPO training/eval generation path and changes the exact kwargs passed through inputs (including env reset kwargs), which could subtly affect datasets that rely on None placeholders.

Overview
Fixes multimodal GRPO training crashes by recursively removing None values from each sample in _generate_and_score_completions before building prompts and running environment resets.

This ensures chat-template rendering/tokenization doesn’t mis-handle None content blocks (e.g., VLM image/text parts), avoiding downstream processor errors like image_grid_thw index mismatches.

(Written by Cursor Bugbot for commit 02a4d60.)

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.


cursor · 2026-03-24T16:54:19Z

+            cleaned_item = remove_empty_fields(item)
+            cleaned_inputs.append(cleaned_item)
+            prompts.append(cleaned_item["prompt"])
+        inputs = cleaned_inputs


Broad None stripping removes top-level image keys breaking detection

High Severity

remove_empty_fields is applied to the entire input dict, not just the prompt content blocks. This strips top-level keys with None values, including "image". When inputs[0] has "image": None (a text-only sample in a mixed batch) but other inputs have actual images, the key is removed from inputs[0]. The subsequent check "image" in inputs[0] then fails, causing images = None and silently losing all images in the batch. The fix should only clean the nested prompt content, not the entire input dict.

Additional Locations (1)

trl/trainer/grpo_trainer.py#L1731-L1737
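A minimal sketch of the narrower scoping suggested here, assuming each sample is a dict with a "prompt" list and an optional top-level "image" key. Both `drop_none` and `clean_prompt_only` are hypothetical names, and the string image is a stand-in for a PIL object:

```python
def drop_none(value):
    # Recursively remove None-valued dict keys.
    if isinstance(value, dict):
        return {k: drop_none(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_none(v) for v in value]
    return value

def clean_prompt_only(sample):
    # Shallow-copy the sample so top-level keys such as "image" survive untouched.
    cleaned = dict(sample)
    cleaned["prompt"] = drop_none(sample["prompt"])
    return cleaned

batch = [
    # Text-only sample in a mixed batch: the top-level image is None.
    {"prompt": [{"role": "user", "content": [{"type": "text", "text": "hi", "image": None}]}],
     "image": None},
    {"prompt": [{"role": "user", "content": [{"type": "image", "image": "a.png", "text": None}]}],
     "image": "<PIL image stand-in>"},
]

print("image" in drop_none(batch[0]))          # whole-sample cleaning drops the key
print("image" in clean_prompt_only(batch[0]))  # prompt-only cleaning keeps it
```

Keeping the top-level keys intact means a later `"image" in inputs[0]` check behaves the same as before the cleaning step.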

inputs[0]["image"] should be a PIL image and will not be removed by my changes.

print(inputs) [{'prompt': [{'content': [{'image': None, 'text': 'You are good at step by step reasoning.', 'type': 'text'}], 'role': 'system'}, {'content': [{'image': '../datas/VisuRiddles/images/sichuan/2021_59.png', 'text': None, 'type': 'image'}, {'image': None, 'text': '[Logical Reasoning] \nThe left image shows the unfolded surface of a cube-shaped box. Which option can be folded into the cube depicted?option: A,B,C,D\nWrite the answer into a JSON form\njson\n{"answer": "X"}', 'type': 'text'}], 'role': 'user'}], 'image': <PIL.PngImagePlugin.PngImageFile image mode=RGB size=560x168 at 0xFFFBE5FEB5D0>, 'metadatas': {'gold_answer': 'A'}}, {'prompt': [{'content': [{'image': None, 'text': 'You are good at step by step reasoning.', 'type': 'text'}], 'role': 'system'}, {'content': [{'image': '../datas/VisuRiddles/images/sichuan/2021_59.png', 'text': None, 'type': 'image'}, {'image': None, 'text': '[Logical Reasoning] \nThe left image shows the unfolded surface of a cube-shaped box. Which option can be folded into the cube depicted?option: A,B,C,D\nWrite the answer into a JSON form\njson\n{"answer": "X"}', 'type': 'text'}], 'role': 'user'}], 'image': <PIL.PngImagePlugin.PngImageFile image mode=RGB size=560x168 at 0xFFFBE5FEBC90>, 'metadatas': {'gold_answer': 'A'}}]

cursor · 2026-03-24T16:54:19Z

+            cleaned_item = remove_empty_fields(item)
+            cleaned_inputs.append(cleaned_item)
+            prompts.append(cleaned_item["prompt"])
+        inputs = cleaned_inputs


Fix not propagated to RLOO trainer's duplicated code

Medium Severity

The remove_empty_fields logic was added only to grpo_trainer.py but not to rloo_trainer.py, which has the same duplicated _generate_and_score_completions method with the identical prompts = [x["prompt"] for x in inputs] pattern. Per project rules, changes to duplicated logic across trainers must be applied consistently to all copies.

(Triggered by project rule: BUGBOT.md)

I'm not sure if the subsequent execution flow and call stack of rloo_trainer.py are exactly the same as grpo_trainer.py. So, to be safe, I will only modify the grpo_trainer that has already been tested.

qgallouedec · 2026-03-25T16:46:49Z

Thanks, I understand the issue. However, I’m not convinced that TRL should support this kind of "polluted" dataset. It seems more appropriate for users to handle data cleaning upstream.

As a general rule of thumb, if this isn’t supported in Transformers (as indicated by the error), then it probably shouldn’t be supported in TRL either. Otherwise, we risk going down a slippery slope where supporting one such case leads to an endless stream of similar edge cases.

In this case the easiest is probably to map:

def clean_empty_images(example):
    for message in example["prompt"]:
        for element in message["content"]:
            if element["type"] == "text" and "image" in element:
                element.pop("image")
    return example

dataset = dataset.map(clean_empty_images)

@albertvillanova what do you think?

SolarWindRider · 2026-03-25T18:29:36Z

Thank you for your reply!

Actually, the None value "pollution" is introduced by dataset.map() itself. Check this out.

from transformers import AutoProcessor
from datasets import Dataset

model_name_or_path = "/home/ma-user/work/Downloads/Models/Qwen/Qwen3-VL-2B-Thinking"
processor = AutoProcessor.from_pretrained(model_name_or_path)
full_question = """What's on the image ? """

samples = [[
    {"role": "system", "content": [{"type": "text", "text": "You are good at step by step reasoning."}]},
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": full_question},
        ],
    },
],[
    {"role": "system", "content": [{"type": "text", "text": "You are good at step by step reasoning."}]},
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": full_question,},
        ],
    },
]]

dataset = Dataset.from_list([
    {"prompt": s}
    for s in samples
])

def clean_empty_images(example):
    for message in example["prompt"]:
        for element in message["content"]:
            if element["type"] == "text" and "image" in element:
                element.pop("image")
    return example

dataset1 = dataset.map(clean_empty_images)  # dataset.map() is actually the pollution source

print(dataset1[0])
"""
{'prompt': [{'content': [{'image': None, 'text': 'You are good at step by step reasoning.', 'type': 'text'}], 'role': 'system'}, {'content': [{'image': 'https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg', 'text': None, 'type': 'image'}, {'image': None, 'text': "What's on the image ? ", 'type': 'text'}], 'role': 'user'}]}
"""

Indeed, as a general rule of thumb, it would be better to 1) fix dataset.map() so that it does not generate keys with None values, or 2) fix jinja2 so that rendering tolerates keys with None values and tokenization comes out correct. However, I'm not very familiar with these two libraries and am more comfortable with TRL. Given my limited expertise, the current code modification is the best solution I can come up with to help the community address this bug. If your engineers can fix dataset.map(), that would be even better, for sure!

albertvillanova

Thanks for flagging, the investigation and the proposed fix, @SolarWindRider! And thanks for the ping, @qgallouedec, really appreciate it. 🤗

This is actually a known issue coming from datasets when mixed types introduce None values.
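As a rough analogy for how mixed keys turn into None values, the sketch below pads every dict in a list with the union of all keys, which is roughly what a single shared struct schema implies. The `unify_structs` helper is hypothetical and pure Python; the real behavior comes from the columnar backend's type inference, not from code like this:

```python
def unify_structs(rows):
    """Pad every dict with None for keys it lacks, mimicking one shared struct schema."""
    keys = []
    for row in rows:
        for key in row:
            if key not in keys:
                keys.append(key)
    return [{key: row.get(key) for key in keys} for row in rows]

content = [
    {"type": "image", "image": "demo.jpeg"},
    {"type": "text", "text": "What's on the image?"},
]
for block in unify_structs(content):
    print(block)
```

The text block gains an `image` key and the image block gains a `text` key, both set to None, which matches the shape of the polluted samples printed earlier in this thread.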

We've run into similar problems before and added a small utility (remove_none_values) to sanitize the inputs on our side, and used it for SFT and DPO:

trl/trl/trainer/dpo_trainer.py, lines 866 to 869 in 9a29d28:

        # Tabular backends like Arrow/Parquet insert `None` for mismatched keys in nested structures. Clean them from
        # sampled data.
        if isinstance(dataset, Dataset):  # IterableDataset does not support `with_transform`
            dataset = dataset.with_transform(remove_none_values)

That said, I have good news: this is now properly addressed upstream by datasets! Recent versions of datasets provide the Json feature type along with on_mixed_types="use_json" during mapping, which avoids introducing these Nones in the first place (available since datasets>=4.7.0).

Given that, it might be cleaner to rely on the upstream fix rather than maintaining workarounds on our end. I’m thinking we could pin datasets to a compatible version: I’ll open a small PR for that so we can discuss.
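The pin itself would live in the package metadata, which this thread does not show. As a rough illustration of the same requirement checked at runtime, here is a sketch with hypothetical helper names; the version parsing is deliberately simplified (a real project would more likely use `packaging.version`):

```python
import re
from importlib import metadata

def version_tuple(version):
    """Turn '4.7.0' or '4.7.0.dev0' into (4, 7, 0), ignoring any non-numeric suffix."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:  # treat '4.7' as '4.7.0'
        parts.append(0)
    return tuple(parts)

def meets_minimum(installed, minimum="4.7.0"):
    return version_tuple(installed) >= version_tuple(minimum)

def datasets_is_recent_enough(minimum="4.7.0"):
    # Returns False when the package is not installed at all.
    try:
        return meets_minimum(metadata.version("datasets"), minimum)
    except metadata.PackageNotFoundError:
        return False

print(datasets_is_recent_enough())
```

Note that this simplification treats pre-releases such as `4.7.0.dev0` as equal to `4.7.0`, which a proper version parser would not.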

SolarWindRider · 2026-03-26T08:50:33Z

Good to know! I'm closing this PR.

fix: input None value

2c98629

cursor Bot reviewed Mar 24, 2026


Merge branch 'main' into qwen3vl

69b6086

Merge branch 'main' into qwen3vl

38455e3

Merge branch 'main' into qwen3vl

02a4d60

qgallouedec assigned albertvillanova Mar 25, 2026

albertvillanova reviewed Mar 26, 2026


SolarWindRider closed this Mar 26, 2026

albertvillanova mentioned this pull request Mar 26, 2026

Require datasets>=4.7.0 for Json dtype to prevent insertion of None values #5376

Merged

SolarWindRider wants to merge 4 commits into huggingface:main from SolarWindRider:qwen3vl

3 participants
